
For each term in Eq. 3.112, we have:

\[
\frac{\partial L_S}{\partial k^{l,i}_n}
= \frac{\partial L_S}{\partial \hat{k}^{l,i}_n}
  \frac{\partial \hat{k}^{l,i}_n}{\partial \bigl(w^l \circ k^{l,i}_n\bigr)}
  \frac{\partial \bigl(w^l \circ k^{l,i}_n\bigr)}{\partial k^{l,i}_n}
= \frac{\partial L_S}{\partial \hat{k}^{l,i}_n}
  \circ \mathbf{1}_{-1 \le w^l \circ k^{l,i}_n \le 1} \circ w^l,
\tag{3.113}
\]

\[
\frac{\partial L_B}{\partial k^{l,i}_n}
= \lambda \Bigl\{ w^l \circ \bigl(w^l \circ k^{l,i}_n - \hat{k}^{l,i}_n\bigr) \Bigr\}
+ \nu \Bigl[ (\sigma^l_i)^{-2} \circ \bigl(k^l_{i+} - \mu^l_{i+}\bigr)
           + (\sigma^l_i)^{-2} \circ \bigl(k^l_{i-} + \mu^l_{i-}\bigr) \Bigr],
\tag{3.114}
\]

where $\mathbf{1}$ is the indicator function that is widely used to estimate the gradient of nondifferentiable parameters [199], and $(\sigma^l_i)^{-2}$ is a vector whose elements are all equal to $(\sigma^l_i)^{-2}$.
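
To make the per-kernel update concrete, the following is a minimal NumPy sketch of Eqs. 3.113 and 3.114. The function name, the flattened array layout, and the reading of the prior term (positive entries pulled toward $+\mu^l_i$, negative entries toward $-\mu^l_i$) are illustrative assumptions, not the book's implementation.

```python
import numpy as np

def grad_k(grad_LS_khat, w, k, k_hat, mu, sigma, lam, nu):
    """Sketch of Eqs. 3.113-3.114: gradient of L w.r.t. one kernel k^{l,i}_n.

    grad_LS_khat : dL_S / d k_hat for this kernel (same shape as k)
    w            : modulation weights w^l (broadcastable to k)
    k, k_hat     : real-valued kernel and its reconstruction
    mu, sigma    : shared Gaussian mean and std for this group (scalars)
    lam, nu      : balancing hyperparameters lambda and nu
    """
    # Eq. 3.113: straight-through estimator; the gradient passes only where
    # the indicator 1_{-1 <= w * k <= 1} is active.
    ste_mask = (np.abs(w * k) <= 1.0).astype(k.dtype)
    d_LS = grad_LS_khat * ste_mask * w

    # Eq. 3.114: Bayesian kernel-loss term (assumed reading): reconstruction
    # part plus a prior that pulls positive entries toward +mu and negative
    # entries toward -mu, scaled by sigma^{-2}.
    recon = lam * w * (w * k - k_hat)
    pos = (k >= 0).astype(k.dtype)
    prior = nu * sigma**-2 * (pos * (k - mu) + (1 - pos) * (k + mu))

    return d_LS + recon + prior
```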

Updating $w^l$: Unlike the forward process, $w^l$ is used in backpropagation to calculate the gradients. This process is similar to the way $\hat{x}$ is calculated from $x$ asynchronously. Specifically, $\delta_{w^l}$ is composed of the following two parts:

\[
\delta_{w^l} = \frac{\partial L}{\partial w^l}
= \frac{\partial L_S}{\partial w^l} + \frac{\partial L_B}{\partial w^l}.
\tag{3.115}
\]

For each term in Eq. 3.115, we have:

\[
\frac{\partial L_S}{\partial w^l}
= \sum_{i=1}^{I^l} \sum_{n=1}^{N_{I^l}}
  \frac{\partial L_S}{\partial \hat{k}^{l,i}_n}
  \frac{\partial \hat{k}^{l,i}_n}{\partial \bigl(w^l \circ k^{l,i}_n\bigr)}
  \frac{\partial \bigl(w^l \circ k^{l,i}_n\bigr)}{\partial w^l}
= \sum_{i=1}^{I^l} \sum_{n=1}^{N_{I^l}}
  \frac{\partial L_S}{\partial \hat{k}^{l,i}_n}
  \circ \mathbf{1}_{-1 \le w^l \circ k^{l,i}_n \le 1} \circ k^{l,i}_n,
\tag{3.116}
\]

\[
\frac{\partial L_B}{\partial w^l}
= \lambda \sum_{i=1}^{I^l} \sum_{n=1}^{N_{I^l}}
  \bigl(w^l \circ k^{l,i}_n - \hat{k}^{l,i}_n\bigr) \circ k^{l,i}_n.
\tag{3.117}
\]
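
The two parts of $\delta_{w^l}$ can be accumulated over all kernels of a layer as in the following NumPy sketch. The stacked array shapes and the broadcasting of $w^l$ are assumptions made for illustration.

```python
import numpy as np

def grad_w(grad_LS_khat, w, k, k_hat, lam):
    """Sketch of Eqs. 3.115-3.117: gradient of L w.r.t. w^l.

    grad_LS_khat, k, k_hat : arrays of shape (I, N, ...) stacking every
                             kernel of layer l; w broadcasts over them.
    lam                    : hyperparameter lambda of the Bayesian loss
    """
    # Eq. 3.116: chain rule through the STE, summed over kernel indices i, n.
    ste_mask = (np.abs(w * k) <= 1.0).astype(k.dtype)
    d_LS = np.sum(grad_LS_khat * ste_mask * k, axis=(0, 1))

    # Eq. 3.117: reconstruction term of the Bayesian loss.
    d_LB = lam * np.sum((w * k - k_hat) * k, axis=(0, 1))

    # Eq. 3.115: the two parts are simply added.
    return d_LS + d_LB
```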

Updating $\mu^l_i$ and $\sigma^l_i$: Note that the same $\mu^l_i$ and $\sigma^l_i$ are used for each kernel (see Section 3.2), so the gradients here are scalars. The gradients $\delta_{\mu^l_i}$ and $\delta_{\sigma^l_i}$ are calculated as:

\[
\delta_{\mu^l_i} = \frac{\partial L}{\partial \mu^l_i}
= \frac{\partial L_B}{\partial \mu^l_i}
= \frac{\lambda \nu}{C^l_i \times H^l \times W^l}
  \sum_{n=1}^{C^l_i} \sum_{p=1}^{H^l \times W^l}
  \begin{cases}
    (\sigma^l_i)^{-2} \bigl(\mu^l_i - k^{l,i}_{n,p}\bigr), & k^{l,i}_{n,p} \ge 0,\\[4pt]
    (\sigma^l_i)^{-2} \bigl(\mu^l_i + k^{l,i}_{n,p}\bigr), & k^{l,i}_{n,p} < 0,
  \end{cases}
\tag{3.118}
\]

\[
\delta_{\sigma^l_i} = \frac{\partial L}{\partial \sigma^l_i}
= \frac{\partial L_B}{\partial \sigma^l_i}
= \frac{\lambda \nu}{C^l_i \times H^l \times W^l}
  \sum_{n=1}^{C^l_i} \sum_{p=1}^{H^l \times W^l}
  \begin{cases}
    -(\sigma^l_i)^{-3} \bigl(k^{l,i}_{n,p} - \mu^l_i\bigr)^2 + (\sigma^l_i)^{-1}, & k^{l,i}_{n,p} \ge 0,\\[4pt]
    -(\sigma^l_i)^{-3} \bigl(k^{l,i}_{n,p} + \mu^l_i\bigr)^2 + (\sigma^l_i)^{-1}, & k^{l,i}_{n,p} < 0,
  \end{cases}
\tag{3.119}
\]

where $k^{l,i}_{n,p}$, $p \in \{1, \ldots, H^l \times W^l\}$, denotes the $p$-th element of $k^{l,i}_n$. In the fine-tuning process, we update $c_m$ using the same strategy as the center loss [245]. The update of $\sigma_{m,n}$ based on $L_B$ is straightforward and is omitted here for brevity.
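
As a final illustration, the scalar updates of Eqs. 3.118 and 3.119 can be written as the NumPy sketch below. The kernel layout $(C^l_i, H^l, W^l)$ and the sign convention of the two cases follow the assumed reading above and are not the book's reference code.

```python
import numpy as np

def grad_mu_sigma(k_i, mu, sigma, lam, nu):
    """Sketch of Eqs. 3.118-3.119: scalar gradients for mu^l_i and sigma^l_i.

    k_i   : kernels of the i-th group, shape (C, H, W)
    mu    : shared mean mu^l_i (scalar)
    sigma : shared std sigma^l_i (scalar)
    """
    k = k_i.reshape(-1)            # all C^l_i * H^l * W^l elements
    scale = lam * nu / k.size      # lambda * nu / (C^l_i x H^l x W^l)
    pos = k >= 0

    # Eq. 3.118: positive elements use (mu - k), negative elements (mu + k).
    d_mu = scale * np.sum(np.where(pos,
                                   sigma**-2 * (mu - k),
                                   sigma**-2 * (mu + k)))

    # Eq. 3.119: per-element derivative of the (scaled) Gaussian negative
    # log-likelihood with respect to sigma.
    d_sigma = scale * np.sum(np.where(pos,
                                      -sigma**-3 * (k - mu)**2 + 1.0 / sigma,
                                      -sigma**-3 * (k + mu)**2 + 1.0 / sigma))
    return d_mu, d_sigma
```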